gh-150075: tar.addfile() doesn't set member offsets#150278
gh-150075: tar.addfile() doesn't set member offsets#150278grantlouisherman wants to merge 1 commit into
Conversation
…file Signed-off-by: grantlouisherman <grantlouisherman041@gmail.com>
| self.offset += len(buf) | ||
| # add original offset to block size | ||
| bufsize=self.copybufsize | ||
| tarinfo.offset_data = self.offset |
There was a problem hiding this comment.
hi there, this seems wrong. you are replacing information about where the file data starts with information about where the file header starts.
There was a problem hiding this comment.
So I could be incorrect in my understanding but the offset is showing where the current offset pointer is. Then we set the pointer for offset_data because now we have added the Len of the buff size which now points to the start of the data and then the offset on line 2378 adds the block size. So then when we add another file we start pointing at the end of the previous block.
|
Hi, I'm a Python dev with some free time on their hands. @grantlouisherman looking through the code for this file, tarfile.offset is available "When TarFile is opened for reading" https://github.com/python/cpython/blob/main/Lib/tarfile.py#L2832 it makes sense that inside the context of writing TarInfo objects to the archive, that offset will not be accessible on the objects themselves at that time. it's not necessarily a bug just because you can't access it, even if your LLM told you it thinks there is a bug. i see in your PR that you potentially copied some wrong attributes (copying offset into offset_data). think about the ramifications of that, and what that means for your test.. other ways to improve your tests are to assert against fixed, known values. otherwise your test can be reporting all value comparisons are correct but if they are all zeroes then it may be a false positive. i also noticed that this is your third attempt to open a pull request with these changes, the previous two being closed because you were not sure of the correct git operations to do to update your branch. learning Please reply with:
|
|
@openly-retro I respect that you probably know more than me but the mere fact that that is wrong I can’t take an LLM comment seriously when it starts with a delusion. I’d rather for @ethanfurman to confirm or deny these assertions that you made. |
sorry I missed that part, you can ignore points 2-4 in my original comment, I should direct those questions to the person who started the original issue. |
Original bug report:
Bug report
Bug description:
It appears that
tarfile.TarFile.addfile()sets neitheroffsetnoroffset_dataattributes in theTarInfoadded toself.members, when it's done adding the file. Members in a reopened TAR have correct values.Reproducer
Expected output
Actual output
CPython versions tested on:
3.15
Operating systems tested on:
Linux
Fix
I added the offset info for a given file based on the current offset of the file and then after the block size is calculated. I also included a test to validate the behavior